
Add option to replicate attn weights on FSDP or EP#3480

Open
gobbleturk wants to merge 7 commits into main from mattdavidow-embed-module-sharding


@gobbleturk
Collaborator

@gobbleturk gobbleturk commented Mar 21, 2026

Not meant to be submitted.

Adds a simple CLI option for sharding embed (embed-attn) in one of three ways:

  • by both FSDP and EP (same as head)
  • by only EP (new default)
  • by only FSDP
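As a rough illustration of the three choices above, the sketch below maps an option value to the mesh axes the embed dimension would be sharded over, and derives which of the two axes end up replicated. The option names (`fsdp_and_ep`, `ep_only`, `fsdp_only`), the axis names (`fsdp`, `expert`), and both helper functions are hypothetical; they are not taken from this PR's diff.

```python
# Hypothetical option names and mesh-axis names, for illustration only.
# They are NOT the flag values or axis names used in the actual PR.
EMBED_SHARDING_AXES = {
    "fsdp_and_ep": ("fsdp", "expert"),  # shard over both axes
    "ep_only": ("expert",),             # shard over EP, replicate over FSDP
    "fsdp_only": ("fsdp",),             # shard over FSDP, replicate over EP
}


def embed_axes(option="ep_only"):
    """Return the mesh axes the attention embed dimension is sharded over."""
    try:
        return EMBED_SHARDING_AXES[option]
    except KeyError:
        valid = sorted(EMBED_SHARDING_AXES)
        raise ValueError(f"unknown option {option!r}; expected one of {valid}")


def replicated_axes(option="ep_only"):
    """Return the axes (out of fsdp/expert) on which the weights are replicated."""
    sharded = embed_axes(option)
    return tuple(axis for axis in ("fsdp", "expert") if axis not in sharded)


if __name__ == "__main__":
    for name in EMBED_SHARDING_AXES:
        print(name, "sharded on", embed_axes(name), "replicated on", replicated_axes(name))
```

In a real JAX setup the returned tuple would presumably feed a partition spec for the attention weights; that wiring is left out here because it depends on the project's own sharding rules.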

We expose all three options and make EP-only the default because it is more performant for large-scale MoE runs: we don't want to shard the embed dimension 4096 ways, only EP=64 ways (we use EP since EP is 2D in our best configs).
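To make the 4096-way versus 64-way comparison concrete, here is a small arithmetic sketch of how thin each shard of the embed dimension becomes. The shard counts come from the description above; the embed size of 8192 and the helper `rows_per_shard` are assumptions picked only for illustration.

```python
def rows_per_shard(embed_dim, ways):
    """Number of embed-dimension rows each device holds under even sharding."""
    if ways <= 0:
        raise ValueError("ways must be positive")
    if embed_dim % ways != 0:
        raise ValueError(f"embed_dim={embed_dim} is not divisible by ways={ways}")
    return embed_dim // ways


# Hypothetical embed size; the PR description does not state the real one.
EMBED_DIM = 8192

if __name__ == "__main__":
    # 4096-way sharding leaves each device a very thin slice.
    print("4096 ways:", rows_per_shard(EMBED_DIM, 4096), "rows per shard")
    # EP-only sharding with EP=64 keeps a much larger slice per device.
    print("  64 ways:", rows_per_shard(EMBED_DIM, 64), "rows per shard")
```

Under this assumed size, 4096-way sharding leaves 2 rows per device while 64-way sharding leaves 128, which is one way to read the motivation for the EP-only default.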

@codecov

codecov Bot commented Mar 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@github-actions

This PR has been automatically marked as stale because it has not had recent activity. It will be closed soon if no further activity occurs. Thank you for your contributions.

github-actions Bot added the stale label (Automatically applied to stale PRs.) on Apr 23, 2026